Multiple imputation with compatibility for high-dimensional data
Authors
Abstract
Multiple Imputation (MI) is always challenging in high-dimensional settings. An imputation model with a selected number of predictors can be incompatible with the analysis model, leading to inconsistent and biased estimates. Although compatibility may not be achievable in such cases, consistent and unbiased estimates can still be obtained with a semi-compatible imputation model. We propose relaxing the lasso penalty so that a large set of variables (at most n) is selected. A substantive model that also uses a formal variable selection procedure in high-dimensional structures is then expected to be nested within the resulting imputation model with high probability. The likelihood estimates, however, can be unstable and face convergence issues as the number of variables approaches the sample size. To address these issues, we further use a ridge penalty to obtain the posterior distribution of the parameters based on the observed data. The proposed technique is compared with standard MI software and with MI techniques available for high-dimensional data, in simulation studies and on a real-life dataset. Our results show the superiority of the proposed approach over existing MI approaches while addressing the compatibility issue.
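The abstract combines two ingredients: a lasso penalty relaxed so that a large predictor set (at most n) enters the imputation model, and a ridge penalty used to obtain the posterior distribution of the imputation-model parameters from the observed data. The NumPy sketch below shows one way such a pipeline could fit together for a single incomplete variable `y` with fully observed predictors `X`. It is not the authors' implementation. The function names, the "relaxation" rule (penalty = `lam_frac` × `lam_max`, with `lam_frac = 0.05`), the cap of (observed rows − 1) predictors, the ridge constant `kappa`, and the normal-linear posterior draw are all hypothetical choices made for illustration.

```python
import numpy as np


def lasso_cd(X, y, lam, n_iter=100):
    """Lasso via a fixed number of coordinate-descent sweeps.

    Minimises (1/2n)||y - Xb||^2 + lam * ||b||_1; X and y are assumed centred,
    so no intercept is fitted here.
    """
    n, p = X.shape
    beta = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    resid = y.copy()
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            resid += X[:, j] * beta[j]          # partial residual without feature j
            rho = X[:, j] @ resid / n
            beta[j] = np.sign(rho) * max(abs(rho) - lam, 0.0) / col_sq[j]
            resid -= X[:, j] * beta[j]          # restore residual with updated beta_j
    return beta


def select_predictors(X_obs, y_obs, lam_frac=0.05):
    """Hypothetical 'relaxed' selection: a small fraction of lam_max keeps many predictors."""
    n = X_obs.shape[0]
    Xc = X_obs - X_obs.mean(axis=0)
    sd = Xc.std(axis=0)
    sd[sd == 0.0] = 1.0
    Xs = Xc / sd                                # standardise so one penalty suits all columns
    yc = y_obs - y_obs.mean()
    lam_max = np.max(np.abs(Xs.T @ yc)) / n     # smallest penalty that zeroes every coefficient
    beta = lasso_cd(Xs, yc, lam_frac * lam_max)
    selected = np.flatnonzero(beta)
    max_vars = n - 1                            # assumed cap: leave room for the intercept
    if selected.size > max_vars:
        keep = np.argsort(-np.abs(beta[selected]))[:max_vars]
        selected = np.sort(selected[keep])
    return selected


def ridge_draw_impute(X, y, selected, kappa, rng):
    """One imputation: draw (beta, sigma^2) from a ridge-regularised posterior, then draw y_mis."""
    obs = ~np.isnan(y)
    Z = np.column_stack([np.ones(X.shape[0]), X[:, selected]])
    Zo, yo = Z[obs], y[obs]
    q = Z.shape[1]
    penalty = kappa * np.eye(q)
    penalty[0, 0] = 0.0                         # the intercept is not penalised
    V = np.linalg.inv(Zo.T @ Zo + penalty)
    V = (V + V.T) / 2.0                         # symmetrise before the Cholesky factor
    b_hat = V @ (Zo.T @ yo)
    rss = np.sum((yo - Zo @ b_hat) ** 2)
    df = max(int(obs.sum()) - q, 1)
    sigma2 = rss / rng.chisquare(df)            # draw the residual variance
    beta = b_hat + np.sqrt(sigma2) * (np.linalg.cholesky(V) @ rng.standard_normal(q))
    y_imp = y.copy()
    mis = ~obs
    y_imp[mis] = Z[mis] @ beta + rng.normal(0.0, np.sqrt(sigma2), int(mis.sum()))
    return y_imp


def multiply_impute(X, y, m=5, lam_frac=0.05, kappa=1.0, seed=0):
    """Select predictors once on the observed rows, then create m imputed copies of y."""
    rng = np.random.default_rng(seed)
    obs = ~np.isnan(y)
    selected = select_predictors(X[obs], y[obs], lam_frac)
    imputations = [ridge_draw_impute(X, y, selected, kappa, rng) for _ in range(m)]
    return selected, imputations


# Usage on simulated data with p > n (all values are synthetic).
rng = np.random.default_rng(1)
X = rng.standard_normal((60, 200))              # n = 60 rows, p = 200 predictors
y = 3 * X[:, 0] - 3 * X[:, 1] + 3 * X[:, 2] + rng.standard_normal(60)
y[rng.choice(60, 12, replace=False)] = np.nan   # 20% of y missing
selected, imputations = multiply_impute(X, y, m=5)
print(len(selected), [round(float(imp[np.isnan(y)].mean()), 2) for imp in imputations])
```

In this sketch, selection runs once on the observed rows, so all m copies share one predictor set and differ only through the parameter and noise draws. That choice ignores selection uncertainty, which is a simplification. Each completed copy would then be analysed separately and the estimates pooled, for example with Rubin's rules.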
Similar resources
Multiple imputation and analysis for high‐dimensional incomplete proteomics data
Multivariable analysis of proteomics data using standard statistical models is hindered by the presence of incomplete data. We faced this issue in a nested case-control study of 135 incident cases of myocardial infarction and 135 pair-matched controls from the Framingham Heart Study Offspring cohort. Plasma protein markers (K = 861) were measured on the case-control pairs (N = 135), and the maj...
Multiple Imputation for General Missing Data Patterns in the Presence of High-dimensional Data
Multiple imputation (MI) has been widely used for handling missing data in biomedical research. In the presence of high-dimensional data, regularized regression has been used as a natural strategy for building imputation models, but limited research has been conducted for handling general missing data patterns where multiple variables have missing values. Using the idea of multiple imputation b...
Multiple imputation and random forests (MIRF) for unobservable, high-dimensional data.
Understanding the genetic underpinnings to complex diseases requires consideration of sophisticated analytical methods designed to uncover intricate associations across multiple predictor variables. At the same time, knowledge of whether single nucleotide polymorphisms within a gene are on the same (in cis) or on different (in trans) chromosomal copies, may provide crucial information about mea...
Multiple Imputation for Missing Data
Multiple imputation provides a useful strategy for dealing with data sets with missing values. Instead of filling in a single value for each missing value, Rubin’s (1987) multiple imputation procedure replaces each missing value with a set of plausible values that represent the uncertainty about the right value to impute. These multiply imputed data sets are then analyzed by using standard proc...
Methods for regression analysis in high-dimensional data
As science, knowledge, and technology have evolved, new and precise methods for measuring, collecting, and recording information have been developed, leading to the emergence and growth of high-dimensional data. A high-dimensional data set, i.e., one in which the number of explanatory variables is much larger than the number of observations, cannot be easily analyzed by ...
Journal
Journal title: PLOS ONE
Year: 2021
ISSN: 1932-6203
DOI: https://doi.org/10.1371/journal.pone.0254112